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Abstract. We reformulate the problem of modularity maximization over the set of 
partitions of a network as a conic optimization problem over the completely positive 
cone, converting it from a combinatorial optimization problem to a convex continuous 
one. A semidefinite relaxation of this conic program then allows to compute upper 
bounds on the maximum modularity of the network. Based on the solution of the 
corresponding semidefinite program, we design a randomized algorithm generating 
partitions of the network with sub-optimal modularities. We apply this algorithm to 
several benchmark networks, demonstrating that it is competitive in accuracy with the 
best algorithms previously known. We use our method to provide the first proof of 
optimality of a partition for a real-world network. 
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1. Introduction 

A widely accepted measure of community structure in networks is the modularity. The 
modularity was introduced in [1] and is a scalar function defined on the set of partitions 
of the network. For a given partition, the modularity describes by how much the 
intra-community links of the network dominate the links between different communities. 
Hence the maximizer of the modularity function is the partition that is best describing 
the community structure of the network. 

However, the problem of maximizing the modularity function over the set of 
partitions is NP-hard [2], and in practice one has to employ algorithms which yield 
a suboptimal partition. A number of such algorithms have been proposed by different 
authors, e.g. [1], [3], [4], [5], [6], [7], [8], [9]. A comparison of some available algorithms has 
been conducted in [10]. However, none of these algorithms allows to judge the quality 
of the obtained solution, i.e. to tell how close the achieved value of the modularity is to 
the maximal one. 

In this contribution we propose an approach which yields both an upper bound on 
the maximum of the modularity function, and partitions with suboptimal values of the 
modularity. We use a formalism developed by Burer [11] to replace the discrete search 
space, namely the set of partitions of the network, by a compact convex set. This set can 
be described as a compact afline section of the completely positive cone [12] . Thus the 
combinatorial problem of modularity maximization is replaced by a convex continuous 
optimization problem, and the complexity arising from the large number of partitions 
to be tested translates to the difficulty of deciding membership in a convex set. Our 
key idea is to replace the difficult convex set by an overbounding approximation with 
an easy description, namely by a semidefinite representable [13] set. Maximizing the 
modularity over this overbounding approximation amounts to a semidefinite program, 
for which numerical solvers are available, and yields an upper bound on the optimal 
value of the modularity. 

We give a simple geometrical interpretation of the employed approximation, which 
allows to design a randomized method for generating suboptimal partitions from the 
maximizer of the semidefinite program. This geometric approach serves also to improve 
the proposed relaxation further. These ideas can be considered as a generalization of 
the semidefinite approach developed by Goemans and Williamson [14] for the max-cut 
problem in combinatorial optimization. 

We test the algorithm on several benchmark networks and show that it is among the 
most accurate methods available in the literature. The upper bounds on the maximal 
modularity are in general within a few percent of the achieved suboptimal values. 
For the Zachary karate club network [19], the improved version of the semidefinite 
relaxation actually closes the gap between the upper bound and the achieved value of 
the modularity, thus furnishing an optimality certificate for the obtained partition. To 
our knowledge, this is the first proof of optimality for a partition of a real- world network. 

The principal drawback of the algorithm is the large computational effort, which 
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limits the range of applicability to networks with a few hundred nodes. 

The remainder of the paper is organized as follows. In the next section we provide 
necessary definitions and background. In Section 3 we reformulate the problem of 
modularity maximization as an equivalent convex continuous optimization problem over 
a compact afline section of the completely positive cone. In Section 4 we derive a 
semidefinite program yielding upper bounds on the maximal modularity. In Section 
5 we develop a randomized algorithm generating suboptimal partitions of the network 
into communities. In Section 6 we test the algorithm on several benchmark networks. 
In the last section we discuss the results and line out some suggestions for further 
improvements. 

2. Definitions and preliminaries 

By R kxl we will denote the space of real k x / matrices, and by S(n) the space of 
real symmetric n x n matrices. The space S{n) is equipped with an Euclidean scalar 
product, the Frobenius inner product (•, •) defined by (A,B) = tr(AB). By l k we 
denote a column vector of length k consisting of l's, and by lkxi = Ifclf a k x I matrix 
consisting of l's. For a matrix A, vec(A) will denote the vector obtained by stacking 
the columns of A. For matrices A, B, A® B will denote the Kronecker product of the 
matrices A, B, i.e. a matrix which is obtained from A by replacing each element A^ by 
the product AijB. By I n we denote the n x n identity matrix. For annxn matrix A, 
denote by diagA the vector consisting of the diagonal elements of A. 

A symmetric simplex in IR n is a polytope having n+1 vertices, such that each vertex 
is represented by a unit length vector and the scalar product of each pair of distinct 
vertices equals — K 

2.1. Modularity 

The modularity is a scalar function on the set of partitions of a network and was 
introduced in [1]. Let G be an undirected graph with n vertices, m edges and adjacency 
matrix A. To any partition of the vertex set into disjoint subsets, called communities, 
we associate a real number, called the modularity, defined by 



where Cj is the community of vertex % and h is the number of edges leaving vertex %. 
Note that ki is the i-th element of the vector Al n . Denote the symmetric n x n matrix 




with elements ^-(A 



2m 



) by B. We then have 



B = -±-JA.l T n Al n -Al n l 



T 



n 



A), 



(1) 



and the vector 1„ is in the kernel of B. 



Detecting community structure with convex optimization 



4 



For a particular partition, the modularity of the partition indicates how much the 
partition matches the community structure of the graph. To detect the community 
structure of the graph the modularity has to be maximized over all partitions. 

A partition of the vertex set into at most p subsets will be represented by a {0, 1}- 
matrix S of size n x p. Here each column corresponds to a community and each row 
to a vertex. The element SV,- is defined as the indicator function of membership of 
vertex % in the community j. Then Sl p = l n and the modularity of the partition is 
given by Q = tr(S T BS) = (B, SS T ). The problem of modularity maximization over all 
partitions into at most p communities hence becomes 

max (B,SS T ): S i3 e {0, 1} V i = 1, . . . , n, j = l,...,p, Sl p = l n .(2) 

5gR nxp 

2.2. Semidefinite programs 

The content of this subsection can be found in many recent standard books on convex 
optimization, e.g. [15], [13]. Any real symmetric matrix X G S(n) can be cosidered 
as a quadratic form on W 1 , defined by x i— > x T Xx. If the quadratic form defined by 
the matrix X is nonnegative everywhere on R n , we shall call the matrix X positive 
semidefinite (PSD), and write X y 0. The set of PSD matrices in S(n) forms a closed 
convex cone, which will be denoted by S + (n). By the Frobenius inner product linear 
functionals on S(n) can be identified with elements of S(n), namely by F(-) = (F, ■). 
A semidefinite program (SDP) is an optimization problem of the form 

min(F , X) : X G S+(n), {F i: X) = a, % = 1, . . . , N, (3) 

where F , . . . , Fn are given elements of S(n), and c±, . . . , cn are given real numbers. 
Such problems can be solved numerically in polynomial time. A list of SDP solvers can 
be found e.g. in [16]. We call X a feasible solution if it satisfies the constraints X y 0, 
(Fi,X) = Ci for all i, and an optimal solution if it in addition minimizes the objective 
function (F , X). 

It is not hard to see that if we replace some of the equalities in (3) by inequalities, 
we can still bring the problem back to the standard form, although with a larger 
matrix X. Also, by changing the sign of Fq we can convert maximization problems 
into minimization problems and vice versa. 

2.3. Completely positive cone 

A matrix X G S(n) is said to be completely positive (CP) if it can be factorized as 
X = CC T such that the factor C has only nonnegative elements. It is not hard to check 
that the set of CP matrices forms a convex cone in S(n), namely the convex conic hull 
of all PSD rank 1 matrices with nonnegative elements. A PSD matrix with nonnegative 
elements is also called doubly nonnegative matrix, or DNN matrix. It follows that every 
CP matrix is DNN. However, for n > 5 there exist DNN matrices which are not CP 
[17], [12], and the cone of DNN matrices is an overbounding approximation of the CP 
cone. 
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The following fundamental result by Burer [11] allows to transform a quadratic 
optimization problem like (2) to a completely positive program, i.e. a convex continuous 
optimization problem involving a constraint of membership in the CP cone. Consider 
the following optimization problem. 

mm x T Qx + 2c T x: Ax = b, Xi G {0, 1} V i G I C {l,...,ra}. (4) 

Here Q G S(n) and c G M. n . Thus we minimize a quadratic function over the positive 
orthant subject to linear and binary constraints. The linear constraints are given by 
some m x n coefficient matrix A and a right-hand side vector b G W 11 . The binary 
constraints are imposed on some subset I of elements of the search vector x. 

Theorem 1. [11, Theorem 3.1] Consider optimization problem (4). Assume that its 
feasible set is nonempty, that the conditions x G and Ax = b together imply X; L G [0, 1] 
for all i G I, and that there exists a vector y G M. m such that A T y G K" , b T y = 1. Then 
the optimal value of (4) is equal to the optimal value of the completely positive program 

min (Q, X) + 2c T XA T y : AXA T y = b, diag(AXA T ) = diag(66 T ), 

XeS(n) 

(XA T y)i = Xu V i G / C {!,...,«}, y T AXA T y — 1, X completely positive. 



3. Reformulation as completely positive program 

In this section we reformulate the problem of modularity maximization as an equivalent 
completely positive program. It is not hard to verify that problem (2) satisfies the 
assumptions of Theorem 1 with x = vec(S), y = ^l n , A T y = ^l np , Q = B, c = 0, 
I — {1, ... , np}. By this theorem, (2) is equivalent to the optimization problem 

V j V V 

x^s^ 3, 5^ Xii ^ n 5 Xijln = ln ' ? dia S X y = !«> 

i=l ij'=l »J=1 

1 1 V ^ rp 

— Xl np = diagX, X completely positive, > l_X i7 -l„ = 1. 
n ' 

Here X is an np x np matrix consisting of p x p blocks X^- of size nxn each. Note that the 
above completely positive program is invariant with respect to simultaneous permutation 
of the row and column indices of the blocks (with the same permutation). In other words, 
the feasible set of the program and its objective function do not change if one replaces 
X by the product (P £g> I n )X(P <S> I n ) T , where P is an arbitrary p x p permutation 
matrix. We can then group average the program with respect to the symmetric group 
S p , i.e. impose the additional condition that X = ( K P®I n )X(P®I n ) T for all permutation 
matrices P. Indeed, for any feasible matrix X of the original program, the group average 
p\ l^pes p (P®I'n)-X-{P®In) T will be feasible with the same value of the objective function 
and satisfy the additional invariance condition. We hence assume that all diagonal 
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blocks of X equal some matrix X D e S(n), and all off-diagonal blocks equal some 
matrix X Q G S(n). This leads to the symmetrized completely positive program 

p 1 

max (B,pX D ), -X s l n = l n , pdmgX s = l n , -X s l n = dmgX D , 

x D ,x eS(n) n n 

X is CP, ^l T n X s l n = 1, 

n 

where X$ = Xd + (p — 1)Xq. It is not hard to see that the conditions on X$ imply 
pX s = By defining X' = -^—^Xn — ^-l nxn the program further simplifies to 

v — 1 1 1 

max (- B, X'), diagX' = 1„, -l npxnp + (I p - -l pxp ) ® X' is CP. (5) 

X'eS(n) p P P 

We obtain the following result. 

Theorem 2. Consider a graph with n vertices and m edges with adjacency matrix A 
and let the matrix B be defined by (1). For every p e N 7 p > 2, the maximum of the 
modularity over the set of partitions of the vertex set of the graph in at most p subsets 
is given by the optimal value of the completely positive program (5). 

Recall that every CP matrix is DNN. The condition that -l npxnp + (I p — ^l pxp )^)X' 
has nonnegative elements amounts to the condition that the elements of X' are contained 
in the interval [— 1]- The condition ^l npxnp + {Ip — p"lpx P ) <S> X' >z is equivalent 
to the condition X' y 0, as the Kronecker product of PSD matrices is again PSD. 
Thus by relaxing the condition of membership in the CP cone in (5) to the condition 
of membership in the DNN cone we obtain a semidefmite program with a feasible set 
that overbounds that of the original completely positive program. The optimal value 
of this semidefinite program will hence overbound the maximal value of the modularity. 
However, we prefer a more intuitive geometrical derivation of this semidefinite program 
in the next section, because it allows to lay out a randomized algorithm for generating 
suboptimal partitions. 

4. Upper bound on the maximal modularity 

In this section we construct a semidefinite program whose solution yields an upper bound 
on the maximal modularity of a graph. The tools employed for this bear resemblance 
with the methods proposed in [14] for dealing with the so-called max-cut problem, and 
in fact can be viewed as their generalization. 

Recall that we represent partitions of the vertex set of the graph into at most p 
subsets by {0, l}-matrices S of size n x p. Each row of S is a standard orthonormal 
basis vector in M p . If the partition assigns the k-th vertex to the community /, then 
the k-th row s fc of the corresponding matrix S is the /-th basis vector e\. Thus all rows 
of S lie in the intersection of the unit sphere in W with the affine subspace given by 
the linear relation (s, l p ) = 1. Any row vector s in this intersection can be represented 

as a sum s = -l p + J^-v, where v is a unit length vector in the p — 1-dimensional 
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linear subspace given by (v, l p ) = 0. In this way we can write the rows of S as sums 
s fc = il p + \j^-j- v k, and the matrix S as ^l nxp + yJ^j^-V, where the rows of the matrix 
V are given by the vectors v k . 

Recall that the modularity Q of a partition is given by {B, SS T ), where B is a fixed 
real symmetric n x n matrix depending on the structure of the graph. We obtain 



p v p p \ p p 

because the vector l n is in the kernel of B. The matrix VV T G S+(n) is the Gram 
matrix of the vectors v k . Note that (v k ,vi) = 1 if the vertices k, I belong to the same 
community, and (vk,vi) = — if these vertices belong to different communities. Hence 
the Vk lie at the vertices of a symmetric simplex in the p — 1-dimensional linear subspace 
given by (v, l p ) = 0. The assignment of the Vk to the different vertices of the simplex 
corresponds to the assignment of the vertices of the graph to the different communities. 
We obtain the following theorem. 

Theorem 3. Consider a graph with n vertices and m edges with adjacency matrix A 
and let the matrix B be defined by (1). For every p e N, p > 2, the optimal value of the 
semidefinite program 

v-1 1 

max (- B,X): X y 0, diagX = 1„, X kl > V 1 < k < I < n (6) 

xes{n) p p — 1 

is an upper bound on the maximum of the modularity over the set of partitions of the 

vertex set of the graph in at most p subsets. In particular, the optimal value of the 

semidefinite program 

n — 1 1 
max ( B,X) : X y 0, diagX = l n , X kl > V 1 < k < I < n 

x&s(n) n n—l 

is an upper bound on the maximal modularity over the set of all partitions. 

Proof. Let S be the n x p {0, l}-matrix corresponding to the partition realizing the 
maximum of the modularity. Define a matrix V by S = ^l n x P + \J*^V- Then 
X = VV T is a feasible solution for the SDP (6), and the optimal value of the SDP 
is overbounding the scalar product Q = (^y-B, VV T ). □ 

Note that the condition X k[ < 1 for k ^ I is automatically satisfied by all feasible 
solutions of (6) due to the conditions X y and diagX = l n . 

It has to be stressed that only the optimal value of the semidefinite program (6) 
yields an upper bound to the considered maximum of the modularity function. An 
arbitrary feasible solution of (6) has no relation to this maximum. However, the theory 
of semidefinite programming allows to obtain upper bounds without actually solving 
the semidefinite program. 

Theorem 4. Assume the conditions of the previous theorem. Let Y e S(n) be a matrix 
with nonpositive off-diagonal elements satisfying Y — ^-B y 0. Then the quantity 

-zjtr Y 3j-l^yi n is an upper bound on the maximum of the modularity over the set 

of partitions of the vertex set of the graph in at most p subsets. 
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Proof. Assume the notations of the theorem. We will show that the quantity in question 
is an upper bound on the optimal value of the semidefinite program (6). Let X be a 
feasible solution to (6). Since the scalar product of two PSD matrices is nonnegative, 
we have 



1 X 

0<(X,Y- ^^B) =tiY + 2 , yX ij Y ij - (X, V -^— 
p p 

<trr + 2^(--l I )^-(X,^-l 

i<j 



B) 

V 

D) 



" trY-^-l T n Yl n -(X,^B). □ 



P — 1 p — 1 p 

Remark 1. To look for the best matrix Y in the previous theorem amounts to the 
semidefinite program 

min (^—I n -J—l nxn ,Y), YijKOVi^j, Y — ?—^-B y 0. 

YeS(n) p — 1 p — 1 p 

This is the dual program to (6) and it has the same optimal value [15]. Every feasible 
solution of the dual program yields an upper bound on the maximum of the modularity. 



4-1. Sharpening of the approximation 

In this subsection we propose an improved semidefinite approximation by imposing 
a generalization of the so-called metric inequalities [18], which are used to tighten 
relaxations for max-cut problems. 

Let us consider a symmetric simplex in MP -1 and a collection of vectors v\, . . . ,v n 
lying at the vertices of this simplex. For any three distinct vectors v iy Vj, we have three 
possibilities. Either all three vectors lie at the same vertex, or two vectors lie at the 
same vertex and the third at another vertex, or all three vectors lie at distinct vertices. 
Recall that the scalar product between two vectors is either — ^-j- or 1, depending on 
whether they lie at distinct vertices or not. It is not hard to see that among the scalar 
products (vi,Vj), (vj,Vk), and (vk,Vi) there cannot be exactly two equal to 1. Therefore 
the inequality (i>j, v 3 ) + (vj, Vk) — (vk, Vi) < 1 must hold for every triple of distinct indices 

k). Therefore we can add the conditions X i3 - + Xj^ — X^ < 1 to relaxation (6). We 
obtain the following semidefinite program. 

r)-l 1 

max (- B,X) : diagX = l n , X kl > V 1 < k < I < n, (7) 

xes+(n) p p — 1 

X i:j + X jk - X ki < 1 V i, j, k e {1, . . . , n}. 
A corresponding sharpened version of Theorem 3 then holds for this relaxation. 



5. Randomized algorithm generating suboptimal partitions 

In this section we utilize the geometrical interpretation of the semidefinite relaxation 
(6) to devise a randomized algorithm yielding good partitions of the graph. We will 
need the following lemma. 
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Lemma 1. Let p > 2 and let Vi, . . . ,v p be the vertices of a symmetric simplex in IR P_1 . 
Let further U be a random orthogonal transformation o/M p_1 drawn uniformly from the 
Lie group of orthogonal (p — 1) x (p — 1) matrices, and let wi, . . . , w p be the images of 
v±, . . . , v p under this transformation. Let W be the p x p matrix composed of the scalar 
products (vk,wi). Then almost surely, in every row ofW there will be a unique maximal 
element. In case of this event, the indices of the maximal elements of all rows form the 
set {l,...,p}. 

Proof. For p = 2 the lemma is evident. 

Let p > 3. Consider two distinct elements of the same row of the matrix W, say Wm 
and Wk m . Equality of these elements amounts to the linear condition (vi — v m ) T Uvk = 
on the elements of U. If the measure of the set of orthogonal matrices satisfying this 
condition is nonzero, then all tangent vectors to the orthogonal group at U must lie in 
the corresponding linear subspace. This amounts to the condition that Vk(vi — v m ) T U 
is a symmetric matrix, i.e. Uvu and vi — v m are collinear. This leads to a contradiction 
with the condition (vi — v m ) T Uvk = 0. This proves the first assertion of the lemma. 

Now suppose that every row of W has a unique maximal element, and that two 
rows k, I share the same index to corresponding to this maximal element. Consider the 
polyhedral cone K = {v \ (v,w m — w m i) > V m' ^ to}. It is not hard to check that 
this cone is the convex conic hull of the vectors —w m r, w! ^ to, and the minimal scalar 
product between two unit length vectors of this cone equals — ^y. This value is attained 
if and only if these unit length vectors equal — w mi , —w m2 for some toi,TO2 ^ to. On 
the other hand, we have (vk, w m — w m >) > 0, (vi, w m — w m >) > for all indices to' ^ m. 
Hence Vk, v\ lie in the interior of the cone K, and the scalar product of these unit length 
vectors equals the minimal value — This leads to a contradiction. □ 

Let X be the optimal solution of the semidefinite program (6). As outlined in the 
previous section, X can be interpreted as the Gram matrix of a collection of vectors Vk 
encoding the assignment of the vertices of the graph to different communities. These 
vectors can be defined as the rows of any factor F of the PSD matrix X, i.e. any n x r 
matrix F such that X = FF T , where r is the rank of X. 

Ideally, the rank of X does not exceed p — 1 and the elements of X take on the 
values — ^-j- and 1. In this case, the vectors Vk are vertices of a symmetric simplex in 
M p_1 . The assignment of the n vectors Vk to the p vertices of the simplex corresponds 
to the assignment of the n vertices of the graph to p communities. One can compute 
this assignment as follows. 

Construct the vertex set of an arbitrary symmetric simplex in MP, for example by 
factorizing the pxp rank p — 1 matrix ^zjip — l P x P and taking the rows of the factor. 
Subject the vertex set to a random orthogonal transformation of M p_1 drawn uniformly 
from the Lie group of orthogonal matrices, to obtain a collection of vectors wi, . . . , w p . 
Construct the n x p matrix W of scalar products (vk,wi). For k = l,...,n, assign 
the vertex k of the graph to the community whose index corresponds to the maximal 
element in the k-th row of W. 
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By Lemma 1, this procedure almost surely reproduces the original assignment 
defined by the vectors v^. By construction the modularity of the partition obtained 
this way will equal the optimal value of (6) and will hence be optimal by Theorem 3. 

However, in general the rank r of the solution matrix X exceeds p — 1 and there are 
elements of X which lie in the interior of the interval [— ^-j-, 1]. Therefore the collection 
of vectors Vk obtained from the factorization of X is a subset of the unit sphere in R r such 
that the scalar products between distinct vectors obey the inequality (vk,vi) > ~^-[- 
Nevertheless, the proximity of two vectors v% on the unit sphere can be considered 
as a measure of the tendency of the vertices k, I of the graph to belong to the same 
community. We propose to construct candidate partitions of the graph from the set 
{t>i, . . . , v n } by the following randomized algorithm, which is an adaptation of above 
construction. 

Algorithm 1. Solve the semidefinite program (6), factorize the optimal solution 
X = FF T to obtain a factor F of size nxr, where r is the rank of X. Define the vectors 
vi, . . . , v n as the rows of F. If p — 1 > r, then append the vectors Vk, k — 1, . . . , n with 
p — r — 1 zeros. Repeat the steps 

1. If p — 1 < r, then draw a random (p — l)-dimensional linear subspace of W from 
a uniform distribution and project the vectors Vk on this subspace to obtain vectors 
v' k G MP -1 , k — 1, . . . , n. Otherwise define v' k = v/~. 

2. Construct the vertex set of an arbitrary symmetric simplex in MP' 1 and subject 
it to a random orthogonal transformation of MP' 1 drawn from a uniform distribution, 
to obtain a collection of vectors wi, . . . , w p . 

3. Construct the n x p matrix W of scalar products (v' k ,wi). 

4. For k — 1, . . . , n, assign the vertex k of the graph to the community whose index 
corresponds to the maximal element in the fc-th row of W. 

5. Compute the modularity of the obtained partition. 

until the maximal value of the modularity obtained in the sequence of steps 5 makes 
no further progress. Output the partition that furnished the maximal value of the 
modularity. 

The stopping rule is formulated somewhat vaguely and can be concretized in many 
ways. For instance, one can stop if N > N and the last improvement of the modularity 
occured more than aN iterations ago, where N is the number of the current iteration and 
a G (0, 1), N G N are prespecified constants. Note also that the algorithm can output 
a partition that has strictly less than p communities, because there can be vertices of 
the symmetric simplex which no vector t> fe was assigned to. 

5.1. Partition in 2 communities 

For p = 2 both the algorithm proposed above and the semidefinite program (6) 
considerably simplify. Namely, the inequality constraints X ki > —1 in (6) are a 
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consequence of the conditions X y and diag A = l n . We thus obtain the semidefmite 
program 



Algorithm 1 simplifies to the following algorithm, because the two vertices of a 
symmetric simplex in R are collinear. 

Algorithm 2. Solve the semidefmite program (8), factorize the optimal solution 
X = FF T to obtain a factor F of size n x r, where r is the rank of X. Define the vectors 
v i, • • • , v n as the rows of F. Repeat the steps 

1. Draw uniformly a random vector w from the unit sphere in W. 

2. For k — 1, . . . , n, assign the vertex k of the graph to community 1 or community 
2 depending on whether the scalar product (vk,w) has positive or negative sign. 

3. Compute the modularity of the obtained partition. 

until the maximal value of the modularity obtained in the sequence of steps 3 makes 
no further progress. Output the partition that furnished the maximal value of the 
modularity. 

Relaxation (7) can also be improved in the case of two communities. Namely, for 
any collection {vt} of unit length vectors distributed among the vertices of a symmetric 
simplex in R we have the additional condition (v j, vf) + (vj,Vk) + (vk,Vi) > —1, because 
at least two vectors lie at the same vertex. Therefore, in addition to the conditions 
Xij + Xjk — Xki < 1, we can add the conditions A^ + Xjk + Xk% > — 1 to relaxation (8). 

Remark 2. For p = 2 the modularity maximization problem in its equivalent form (5) 
reduces to the maximization of the linear function \{B, ■) over the max-cut polytope [18]. 
The semidefinite relaxation (8) and Algorithm 2 are standard tools for obtaining upper 
bounds and suboptimal solutions for this problem and were proposed in [14]- 

6. Examples 

In this section we compute upper bounds on the maximal modularity and test the 
algorithms presented in the previous section on several benchmark networks used in the 
literature. 

6.1. Zachary karate club network 

The Zachary karate club network is a social network with 34 nodes studied in [19]. A 
split of this network in two communities was observed. In the table below we give upper 
bounds Qupper and achieved values of the modularity Q su bo P t together with the number 
of communities in the best partition for different values of p. 

For p > 4 the algorithm retuned the same partition in 4 communities. The 
partition for p = 2 is the same as the one obtained by Newman [4], and 




(8) 
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Table 1. Upper bounds and achieved values of the modularity for the karate club 
network. 

P Qupper Qsubopt # comm. 

2 0.376 4765 0.3717949 2 

3 0.420 4657 0.402 0381 3 

4 0.432 3106 0.419 7896 4 

5 0.435 3398 4 

6 0.436 5051 4 

7 0.4370969 4 
34 0.438 6004 



differs from the observed split by the assignment of one vertex. The partition 
obtained for p = 4 is given by {1,2,3,4,8,12,13,14,18,20,22}, {5,6,7,11,17}, 
{9, 10, 15, 16, 19, 21, 23, 27, 30, 31, 33, 34}, {24, 25, 26, 28, 29, 32}. It has a slightly larger 
modularity than the best one obtained so far in the literature: Duch and Arenas give a 
value of 0.4188 [5], Newman gives a value of 0.419 [9]. 

The solution of the semidefinite program (6) took approximately 1 sec for each 
value of p. 100 iterations of Algorithm 1 were sufficient to obtain the solution for each 
value of p, which took a fraction of a second of CPU time. 

For the karate club network we ran also the sharpened semidefinite program (7) for 
the values p = 2, 3, 34. For p = 2, the inequalities + Xjk + X^ > — 1 were added. 
For p = 3 the upper bound improved to 0.404 6578. For p = 34, the optimal solution to 
relaxation (7) is the one generated by above partition in 4 communities. For p = 2, the 
optimal solution is the one generated by above partition in 2 communities. This proves 
optimality of these partitions. In other words, the partition returned by the algorithm 
for p = 2 is the best one among all partitions in two communities, and the one returned 
for p = 4 is globally optimal among all partitions. 

The running time for solving (7) for each value of p was approximately 3 hours, for 
the augmented relaxation for p = 2 it was 5 h 15 min. 

6.2. Dolphin network 

The dolphin network is a social network with 62 nodes studied in [20] . In the table below 
we give upper bounds Q upP er and achieved values of the modularity Qsubopt together with 
the number of communities in the best partition for different values of p. 

For values p > 5 the algorithm returned partitions in 5 communities. 

The solution of the semidefinite program (6) took approximately 36 sec for each 
value of p. 100 iterations of Algorithm 1 were sufficient to obtain the solution for each 
value of p, which took a fraction of a second of CPU time. 
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Table 2. Upper bounds and achieved values of the modularity for the dolphin network. 

P Qupper Qsubopt # comm. 

2 0.4119486 0.402 7333 2 

3 0.515 4178 0.4941854 3 

4 0.5451018 0.526 7988 4 

5 0.549 8893 0.528 5194 5 

6 0.5516863 5 

7 0.552 6765 5 
62 0.555 2841 



6.3. Random graph 

In this subsection we present results on an artificial benchmark problem which was 
introduced in [1] and used in the literature to compare different algorithms for detecting 
community structure. We consider a graph with 128 vertices. The vertex set is 
artificially divided in 4 communities of 32 vertices each. We will refer to this partition 
as to the canonical partition. The edges of the graph are generated randomly and 
independently such that there are n out expected edges from each vertex to vertices in 
different canonical communities, and ni n expected edges to vertices within the same 
canonical community. These numbers are normalized to satisfy n out + Ui n = 16. 
The probability of presence of an edge between a given pair of vertices in different 
communities is hence p ou t = n out /96, and the probability of presence of an edge between 
a pair of vertices in the same community is p in = n in /31. The higher the value of 
n out , the more difficult it becomes to detect the community structure of the random 
graph. The performance of an algorithm is measured by the fraction of vertices that 
were assigned correctly as defined by the canonical partition, as a function of n out . 

The expected modularity of the canonical partition equals approximately 3/4 — 
n out /lQ. This value is empirically valid with an error of 5 • 10 -4 . The graph becomes 
completely random for n ou t = 1536/127 ~ 12.0945, namely when pi n = p out = 16/127. 

We tested Algorithm 1 for the values n out = 6, 7, 8, 9, 10, generating 10 random 
graphs for each value. For each random graph, we computed an upper bound on the 
modularity and ran Algorithm 1 for the values p = 4 and p — 5. For p = 5 the algorithm 
often returned a partition in 4 subsets, especially for lower values of n out . If it returned 
a partition in 5 subsets, then its modularity was almost always smaller than the best 
one obtained for p — 4. There were three exceptions, in one out of 10 graphs for the 
values n out = 7, 8, 10 each. In these cases, the vertices in the fifth community were all 
considered as incorrectly classified. 

For each value of n ou t, we estimated the mean and the standard deviation of the 
upper bound on the modularity, of the best modularity achieved by Algorithm 1, and 
of the fraction of correctly classified vertices, on the basis of the random sample of 10 
graphs. Thus the estimated error on the provided values of the mean equals one third 
of the listed standard deviations. 
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Table 3. Upper bounds, achieved values of the modularity, fraction of correctly 
classified vertices for random graphs. 



n ou t 


Qu PP er 


Qsubopt 


% correct vertices 


6 


0.3760 ±0.0104 


0.375 ±0.011 


99.06 ±0.62 


7 


0.3238 ± 0.009 


0.313 ±0.012 


96.8 ± 1.58 


8 


0.2915 ±0.006 


0.251 ±0.015 


85 ±5.47 


9 


0.2762 ± 0.004 


0.207 ±0.008 


56.02 ±9.08 


10 


0.268 ±0.005 


0.196 ±0.006 


42.27 ±4.09 



Table 4. Upper bounds and achieved values of the modularity for the Jazz musicians 
network. 

P Qupper Q subopt 4^ COmm. 

2 0.3541669 0.315 3506 2 

3 0.450 0446 0.4444694 3 

4 0.459 7013 0.4451041 4 

5 0.4614191 4 

6 0.4621197 4 

7 0.462 4950 4 
198 0.463 6604 



Based on the results of the comparative analysis presented in [10], our algorithm is 
outperformed only by the simulated annealing algorithm of [6] for n out < 8, and is the 
most accurate for the higher values of n out . 

The solution of the semidefinite program (6) took approximately 50 min for each 
random graph and each value of p. 10 6 iterations of Algorithm 1 were performed to 
obtain the solution for each value of p, which took approximately 1 hour of CPU time. 

6.4- Jazz musicians 

The Jazz musicians network is a social network with 198 nodes studied in [21]. In the 
table below we give upper bounds Q upP er and achieved values of the modularity Q su bopt 
together with the number of communities in the best partition for different values of p. 

The obtained modularity value is slightly smaller than that of the best known 
partition of this network, namely 0.4452 reported in [5]. 

The solution of the semidefinite program (6) took approximately 11 hours for each 
value of p. 10 6 iterations of Algorithm 1 were made to obtain the solution for each value 
of p, which took approximately 1 hour of CPU time. 

7. Conclusions 

In this paper we considered the problem of community detection in networks. A widely 
accepted approach to this problem is to maximize the modularity function over the set 
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of all partitions of the network. We studied the problem of modularity maximization 
from the viewpoint of convex analysis. Our contribution is threefold. 

Firstly, we reformulated the discrete problem of modularity maximization over 
the set of partitions as a convex continuous optimization problem (Theorem 2). This 
problem appears in the form of a completely positive program, that is the problem of 
optimizing a linear objective function over an affine section of the cone of completely 
positive matrices. This contribution is more of theoretical nature, because efficient 
algorithms to solve completely positive programs are not available due to their NP- 
hardness. 

Secondly, we relaxed the obtained completely positive program to a semidefinite 
program, for which efficient means of solution exist. This relaxation was achieved by 
a standard approach in convex optimization, namely the replacement of the cone of 
CP matrices by the overbounding cone of doubly nonnegative matrices. The optimal 
value of the semidefinite relaxation is an upper bound on the maximal achievable 
modularity. Our approach explicitly includes the possibility to limit the number of 
allowed communities in the partitions of the network, and so in fact yields a series of 
upper bounds indexed by the maximal number of communities. We provided a simple 
and intuitive geometrical interpretation of the semidefinite relaxation. These results are 
formalized in Theorem 3. 

Thirdly, we proposed a randomized algorithm to generate partitions of the network 
with suboptimal modularity. This algorithm uses the solution of the above-mentioned 
semidefinite relaxation as a starting point, so this semidefinite program must be solved 
in a preliminary step. The algorithm is described in Section 5 (Algorithm 1). 

If we consider only divisions of the network in two communities, the problem of 
modularity maximization becomes equivalent to the well-known problem of optimization 
of a linear objective function over the max-cut polytope. In this case both the 
semidefinite relaxation and the randomized algorithm considerably simplify. In fact, 
they reduce to the standard semidefinite optimization approach to the max-cut 
problem. We provided explicit descriptions of the simplified semidefinite relaxation 
and randomized algorithm in Subsection 5.1. 

We tested our approach on a number of standard benchmark problems. The 
proposed algorithm proved to be among the most accurate algorithms known to date, 
and in many cases it yields the best results known so far. For the karate club network, 
the improved relaxation (7) actually proves optimality of the obtained partition. 

A drawback of the algorithm is the high computational effort, and as a consequence 
of this, its limited range of applicability to networks with a few hundred nodes. However, 
with the development of available SDP solvers this range of applicability will increase. 

While the cost of solving the semidefinite relaxation is almost independent (with 
the exception of partitions in at most 2 communities) of the maximal number of allowed 
communities, the amount of required iterations of the randomized algorithm increases 
with this number. This is because the rank of the solution matrix of the semidefinite 
program increases more quickly than the number of allowed communities, and more 
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dimensions have to be projected out. But the projection step in the algorithm tends 
to destroy the information contained in the solution matrix, and more iterations of the 
algorithm have to be conducted to compensate for this loss. However, given the relatively 
small size of the networks that can be treated, the number of considered communities 
is also small, and this is not (yet) a critical issue. 

It can be observed that the computed upper bounds rapidly converge when 
increasing the number of allowed communities in the partition. In most cases the bounds 
are quite tight, with a relative error in the percent range between the bound and the 
achieved value of the modularity. For the random network with parameter value n out = 6 
the precision of the bound lies at astonishing 0.3% and is an order of magnitude lower 
than the standard deviation of the maximal modularity itself. 

There are several ways to improve the semidefinite relaxations (6) and (8) further, 
although at a higher computational cost. 

One approach is to pursue the path leading to relaxation (7). More inequalities can 
be obtained by considering subsets of 4 and more vectors. The resulting optimization 
problems are characterized by the presence of one matrix inequality constraint and a 
large number of linear inequality constraints. For this type of problems special solution 
methods exist [22] , which can treat larger problem instances than standard SDP solvers. 

Another approach consists in using tighter semidefinite relaxations for the 
completely positive cone (see e.g. [23]). 

The methods proposed in this contribution also work for graphs with weighted 
edges. Therefore it is imaginable to combine them with other modularity optimization 
methods to treat larger networks. One way would be to first use some hierachical 
clustering scheme to reduce the size of the network to several hundred nodes, and then 
to apply the algorithms proposed here to perform the final optimization. 
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